---
title: "Concepts"
---

import SvgStreams from './streams.svg';
import SvgStreamInput from './stream-input.svg';
import SvgStreamOutput from './stream-output.svg';
import SvgStreamTransform from './stream-transform.svg';
import SvgPipelineFilters from './pipeline-filters.svg';
import SvgSubPipeline from './sub-pipeline.svg';
import SvgModulePipelineContext from './module-pipeline-context.svg';

## Stream

Pipy is a proxy that works like a _"stream processor"_. It gulps in _streams_, processes _streams_, and spews out _streams_.

### Inbound and outbound

Streams coming from a client (or downstream) are called _inbound streams_. Streams going to a server (or upstream) are called _outbound streams_.

<div style="text-align: center">
  <SvgStreams/>
</div>

Both inbound side and outbound side have input and output streams:

* On the inbound side, input streams go from the client to Pipy and output streams go from Pipy back to the client.
* On the outbound side, output streams go from Pipy to the server and input streams go from the server back to Pipy.

### Events

A Pipy stream is a series of **events**. There are four types of events:

* Data
* MessageStart
* MessageEnd
* StreamEnd

An input stream coming from outside of Pipy is a series of **Data** events, ended by a **StreamEnd** event. Each **Data** event holds a chunk of bytes received from the I/O.

For example, a typical HTTP request would be like this:

<div style="text-align: center">
  <SvgStreamInput/>
</div>

What Pipy does is process the events in the input stream. Some are transformed, some are discarded. New events can be inserted too. Those new events also include other types of events besides **Data** and **StreamEnd**, namely **MessageStart** and **MessageEnd**. These non-Data events are used inside Pipy as _"markers"_ to give the original raw bytes some higher-level meanings for the business logic you are running.

For example, the above input stream would be decoded into an _HTTP request message_ wrapped between a pair of **MessageStart** and **MessageEnd** events. That message is transformed into an _HTTP response message_ and encoded into a series of **Data** events before going to the output.

<div style="text-align: center">
  <SvgStreamTransform/>
</div>

Eventually, after all the processing, the stream of events are sent to an output. At this point, **MessageStart** and **MessageEnd** events are discarded, only **Data** and **StreamEnd** events are spewed out.

<div style="text-align: center">
  <SvgStreamOutput/>
</div>

## Filters and pipelines

An intuitive way to understand how Pipy works is to think of it as [_Unix pipelines_](https://en.wikipedia.org/wiki/Pipeline_(Unix)). The only thing that's changed from Unix pipelines is that we are dealing with streams of **events**, rather than _bytes_.

An input stream is processed through a chain of **filters** inside of Pipy. Each filter works somewhat like a tiny Unix process that reads from its input (stdin) and writes to its output (stdout). The output from one filter is connected to the input to the next.

<div style="text-align: center">
  <SvgPipelineFilters/>
</div>

A chain of filters is called a **pipeline**. There are 4 types of pipelines, categorized
by various input sources:

### Port pipeline

A **port pipeline** is created whenever there's an incoming TCP connection (or a UDP virtual session) on a listening port. It then reads **Data** events from the connection, processes them, and writes output back to the client. This resembles the widely adopted "_request and response_" communication model as in HTTP, where the input to the pipeline is the requests and the output from the pipeline is the responses. Every incoming connection to Pipy has a dedicated _port pipeline_ to it, handling the two-way communication happening in that connection.

### Timer pipeline

A **timer pipeline** can be created periodically. The pipeline can generate any sorts of inputs it needs at creation time. Whatever it outputs is simply discarded after all the processing in its filters. This type of pipeline can be used to carry out [_cron job_](https://en.wikipedia.org/wiki/Cron)-like tasks.

### Signal pipeline

A **signal pipeline** is created when a signal is received by the Pipy process. The pipeline can generate any sorts of inputs it needs at creation time. Whatever it outputs is simply discarded after all the processing in its filters. This type of pipeline is useful when certain tasks need to be carried out at the time of a signal.

### Sub-pipeline

A **sub-pipeline** is a pipeline that can be started from other pipelines by using a **joint filter**. The most basic joint filter, besides a couple of others, is [link()](/reference/api/Configuration/link). It receives events from the filter before it in its own pipeline, pumps them to a sub-pipeline for processing, reads back whatever that sub-pipeline outputs and pumps them all down to its next filter.

<div style="text-align: center">
  <SvgSubPipeline/>
</div>

One way to understand _joint filters_ and _sub-pipelines_ is compare them to _callers_ and _callees_ when calling a sub-routine in procedural programming. The input to the joint filter is like parameters to the sub-routine, and the output is like return values from it.

Unlike sub-pipelines, the other types of pipelines, namely **port pipelines**, **file pipelines**, **timer pipelines** and **signal pipelines**, cannot be _"called"_ internally from a joint filter. They can only be started from external sources. We call these pipelines **root pipelines**.

## Module

A **module** is a PipyJS source file containing scripts that define a set of **pipeline layouts**.

A **pipeline layout** tells Pipy what filters a pipeline has and in what order. Note that defining a pipeline layout in a module doesn't create any pipelines at that moment. It only tells what pipelines look like when they're actually created at runtime to handle some inputs, though in some cases, when the meaning is obvious, we use the term "_pipeline_" for "_pipeline layout_" just for brevity.

## Context

A **context** is a set of variables attached to a pipeline, used by scripts to maintain the current state of the pipeline.

All pipelines use the same set of _context variables_ inside a specific Pipy module. In other words, all contexts are of the same "_shape_" for all pipelines in a single module. But different modules can have different context shapes. When you start a Pipy module, the first thing is define the _shape_ of the context to be used in that module by calling the built-in function [pipy()](/reference/api/pipy). It tells Pipy what _context variables_ it's going to use and their initial values.

Although all pipelines from the same module have exactly the same set of _context variables_, each pipeline can have its own variable "_values_" that are isolated from others, or let's say, each pipeline can have its own "_state_".

<div style="text-align: center">
  <SvgModulePipelineContext/>
</div>

To the scripts in a module, these context variables are used like _global variables_ in other programming languages, which means that they are always accessible to the scripts from anywhere in the same module file. However, they are **NOT** the _global variables_ you know in the common sense.

"_Global variable_" usually means "_globally unique_", you should only have one single state of these variables at a given time, whereas in Pipy you can have many separate states of them depending on how many different contexts you have. If you are familiar with multi-thread programming concepts, you can correlate this to [_TLS (thread-local storage)_](https://en.wikipedia.org/wiki/Thread-local_storage), where _TLS variables_ can have different states across different _threads_, or by the terms of Pipy, _context variables_ can have different states across different _pipelines_.
